HiCuT: An efficient and low input method to identify protein-directed chromatin interactions

3D genome organization regulates gene expression, and disruption of these long-range (>20kB) DNA-protein interactions results in pathogenic phenotypes. Chromosome conformation methods in conjunction with chromatin immunoprecipitation were used to decipher protein-directed chromatin interactions. However, these methods required abundant starting material (>500,000 cells), sizable number of sequencing reads (>100 million reads), and elaborate data processing methods to reduce background noise, which limited their use in primary cells. Hi-C Coupled chromatin cleavage and Tagmentation (HiCuT) is a new transposase-assisted tagmentation method that generates high-resolution protein directed long-range chromatin interactions as efficiently as existing methods, HiChIP and ChIA-PET, despite using 100,000 cells (5-fold less) and 12 million sequencing reads (8-fold fewer). Moreover, HiCuT generates high resolution fragment libraries with low background signal that are easily interpreted with minimal computational processing. We used HiCuT in human primary skin cells to link previously identified single nucleotide polymorphisms (SNPs) in skin disease to candidate genes and to identify functionally relevant transcription factors in an unbiased manner. HiCuT broadens the capacity for genomic profiling in systems previously unmeasurable, including primary cells, human tissue samples, and rare cell populations, and may be a useful tool for all investigators studying human genetics and personalized epigenomics.


Introduction
The structure and function relationship of the 3D genome organization remains a fundamental question in biology. 3D genome dynamics and functions during the cell cycle, development, gene transcription, and signalling have been studied in multiple cell types [1,2]. Disruption of the 3D genome results in distinct pathogenic phenotypes, including malformation of the skull and bones [3,4]. However, assessing 3D genome dynamics in human tissues, primary cells, and other rare cell populations has been limited.
Protein-DNA interactions are the basic unit of genome organization. Current methods to detect long-range chromatin interactions mediated by a specific protein factor include Hi-C sequencing coupled with chromatin immunoprecipitation-sequencing (ChIP-seq), chromatin interaction analysis by paired-end tag sequencing (ChIA-PET), proximity ligation-assisted ChIP-seq (PLAC-seq), and HiChIP [5][6][7][8][9]. These methods generally perform standard Hi-C followed by chromatin immunoprecipitation to capture DNA-protein complexes. Chromatin immunoprecipitation relies on non-specific chromatin fragmentation and immunoprecipitation, which contributes to high background noise and a low signal-to-noise ratio. These methods require large amounts of starting material (500,000-100 million cells), deep sequencing (100-500 million reads per sample), and elaborate computational data processing algorithms to reduce background noise. A recently developed transposase-mediated tagmentation method (CUT&Tag) uses an enzyme-tethering strategy to improve capture of DNA-protein complexes, thereby increasing assay sensitivity and specificity and reducing the starting material required [10,11]. Here, we describe Hi-C Coupled chromatin cleavage and Tagmentation (HiCuT), a Hi-C tagmentation strategy that provides efficient and high-resolution protein-directed long-range chromatin interactions from 100,000 cells and 12 million sequencing reads per sample. Relative to current methods, this assay reduces the starting material requirement by more than 5-fold, the sequencing depth requirement by 8-fold, and sample processing time by 50%.
[...] sequencing adapters (Fig 1A and 1B). This tagmentation step captures protein-DNA complexes and simultaneously prepares DNA fragments for sequencing library amplification. After sequencing, we process our data using the HiC-Pro pipeline to identify informative unique paired-end tags [14].
We performed HiCuT using antibodies against CTCF, a well-established transcription factor that regulates 3D nuclear architecture, in 100,000 cells of the human B lymphocyte GM12878 cell line (S1 Table). HiCuT generated highly reproducible datasets with a Spearman correlation coefficient >0.7 between the biological replicates. We pooled three samples to create a 300,000 cells dataset for downstream analysis (S1A Fig). Mapped reads were strongly enriched at CTCF-binding sites identified by ENCODE GM12878 ChIP-Seq datasets, confirming the high specificity of the HiCuT assay towards profiling DNA binding factor occupancy (S1B, S1C, and S1G Fig) [15]. To assess the specificity of protein mediated interactions detected using HiCuT, we processed HiCuT data using the HiC-Pro pipeline. We retained long-range chromatin interactions between 20kb to 2Mb, with at least one end overlapping with a known published ENCODE CTCF ChIP-seq peak (S2 Fig and S1 Table).
Given the low cell number and decreased sequencing depth, we expect HiCuT to largely detect high-frequency chromatin interactions. We compared interactions obtained from HiCuT to published loop calls from a validated and heavily referenced Hi-C dataset generated from ~125 million GM12878 cells and ~6.5 billion paired-end reads [13]. Matched loops shared both loop anchors. Our HiCuT dataset captured 52% of all loops called from Hi-C data [13]. This overlap is comparable to the 42% overlap observed between the Hi-C loops and a published 2 million cell HiChIP dataset, despite the HiChIP dataset being generated from 7-fold more cells and 18-fold more sequencing reads [16] (Figs 2A and 2B and S1D, and S1 and S2 Tables). We visualized our HiCuT data in Juicebox and overlaid the interactions onto a Hi-C map (Figs 2C, S1E, and S3A-S3C) [13,15,17,18]. In the Juicebox panel, the top half of the map displays the HiCuT data, where the black boxes identify long-range loops, and the bottom half displays long-range loops called by the published HiCCUPS method (open blue boxes) [13]. The captured loops strikingly mirror each other. Thus, HiCuT captured most, if not all, of the major identified Hi-C loops (S1E Fig). Next, we used aggregate peak analysis (APA) to quantify the aggregate enrichment of the entire set. We aggregated HiCuT and HiChIP interaction counts over pairwise combinations of CTCF ChIP peaks falling within a 5 kb to 1 Mb distance interval. Compared to HiChIP, CTCF HiCuT datasets generated higher center enrichment and APA scores (S1F Fig). Importantly, 80% of HiCuT interactions colocalize with a validated CTCF ChIP-Seq peak, indicating that the HiCuT interactions are highly specific (S1 Table).
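The loop comparison above rests on the rule that a matched loop shares both anchors. A minimal sketch of how such a match could be scored is given below. It is an illustration under stated assumptions and not the authors' code: loops are taken to be tuples of one chromosome plus two half-open anchor intervals with the first anchor upstream, HiCuT interactions are taken to be 1 bp position pairs, and the function names are hypothetical.

```python
def loop_is_captured(loop, interactions):
    """Return True if any interaction has one end in each anchor of the loop.

    loop: (chrom, a1_start, a1_end, a2_start, a2_end), anchor 1 upstream (assumed).
    interactions: iterable of (chrom, pos1, pos2) with 1 bp ends (assumed).
    """
    chrom, a1_start, a1_end, a2_start, a2_end = loop
    for i_chrom, pos1, pos2 in interactions:
        if i_chrom != chrom:
            continue
        lo, hi = sorted((pos1, pos2))  # order the two ends along the chromosome
        if a1_start <= lo < a1_end and a2_start <= hi < a2_end:
            return True
    return False


def fraction_captured(loops, interactions):
    """Fraction of reference loops with both anchors hit by one interaction."""
    if not loops:
        return 0.0
    captured = sum(loop_is_captured(loop, interactions) for loop in loops)
    return captured / len(loops)
```

A linear scan like this is only practical for small examples; a genome-scale comparison would normally index the anchors by chromosome first.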
CTCF frequently colocalizes with the cohesin protein complex, and a SMC1a cohesin HiChIP dataset generated from 25 million GM12878 cells also had a 38% overlap against the Hi-C data set, which was similar to our 52% CTCF HiCuT overlap (S3D and S3E and S4A and S4B Figs) [7]. Finally, CTCF HiCuT data appropriately identified published in-situ HiC and DNA fluorescence in situ hybridization (DNA-FISH) validated long-range CTCF-mediated loops (Figs 2D, S3F-S3I, and S4C) [13]. Taken together, CTCF HiCuT identifies long-range interactions as effectively as existing methods. Importantly, compared to a full loop calling pipeline like HiCCUPS, HiCuT requires only minimum processing of interactions to reliably detect high frequency chromatin contacts.
We extended our analysis and performed HiCuT using antibodies against RNA Polymerase 2 (Pol2) in GM12878 cells. HiCuT generated highly reproducible datasets, and mapped reads were strongly enriched at Pol2-binding sites identified by ENCODE GM12878 ChIP-Seq datasets (S5A-S5D Fig). HiCuT identified ~106,000 long-range interactions (Fig 3A and S1 Table). We compared our dataset to a published Pol2 chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) dataset generated from 100 million GM12878 cells [8]. Our 300,000 cell HiCuT dataset captured 84% of the identified interactions, despite using 325-fold less starting material (100 million cells in ChIA-PET) and ~40-fold fewer sequencing reads (Fig 3A and S2 Table). We visualized both datasets on a Hi-C map and in the WashU browser, and most, if not all, of the major interactions are detected by the HiCuT method (Figs 3B and 3C, S3J, and S5E) [8]. Compared to ChIA-PET, Pol2 HiCuT datasets generated higher center enrichment and APA scores (S5D Fig). Moreover, 63% of HiCuT interactions also colocalized with a validated Pol2 ChIP-Seq peak (S1 Table). Thus, HiCuT robustly captures long-range chromatin interactions for multiple proteins.
Existing methods to detect long-range interactions have limited use in human primary cells, where input cell number is restricted. To determine whether HiCuT can overcome this limitation, we performed HiCuT in 100,000 primary human keratinocytes using antibodies against histone 3 lysine 27 acetylation (H3K27ac), a well-known epigenetic mark of active enhancers (S6A-S6D Fig). In our pooled 300,000 cell dataset, HiCuT identified ~76,000 long-range interactions (Fig 4A and S1 Table). We compared these interactions with known single nucleotide polymorphisms (SNPs) identified in GWAS studies on human inflammatory skin diseases (NHGRI-EBI catalog, EFO_0000676). 725 interactions overlapped with known SNPs, and 343 of those interactions had one anchor originating from a gene promoter (Fig 4A and S3 Table). These candidate genes were further analysed by gene ontology using EnrichR, and two of the top 6 hits were related to inflammatory skin diseases, with psoriasis being the top hit (Fig 4B) [15,17,19]. HiCuT also appropriately captured an established and validated locus containing multiple long-range SNP-gene interactions (Fig 4C) [20]. Thus, HiCuT linked previously identified SNPs to potential candidate genes (S6E Fig). Finally, we computationally interrogated all identified anchor sequences for over-represented transcription factor binding sites, and the top 4 inferred transcription factors were p63, Fra-2, p73, and ZFX, all well-established mediators of keratinocyte function (Fig 4D) [21][22][23][24]. As a negative control, similar analysis of a HiCuT H3K27ac dataset from GM12878 cells revealed a different, nonoverlapping set of inferred transcription factors (Figs 4D and S6F).

Discussion
In this study, we generated high-confidence protein-directed chromatin interaction profiles from 100,000 cells using antibodies against CTCF, RNAPol2, and the active enhancer mark H3K27ac. We found comparable efficiency in detection of long-range interactions when we benchmarked HiCuT against published datasets that used recommended cell numbers of existing techniques. As expected, HiCuT produced fewer unique valid interactions compared to existing methods, because we start with fewer cells and sequencing reads (S1 Table). However, the percentage of unique valid interactions and the cis:trans interaction ratios remain in-line with existing methods.
Fig 2. [...] [15]. Maximum intensity is indicated in the lower right of each panel. (D) GM12878 Hi-C contact map with published location of in situ DNA FISH probes (P1 to P3) previously used to verify a chromatin loop on the chromosome 17 (blue box) [13]. CTCF Hi-CuT interactions (black boxes) are superimposed onto this map. CTCF ChIP tracks are from ENCODE (GSM822312) [13,15]. Maximum intensity is indicated in the lower right of each panel. https://doi.org/10.1371/journal.pgen.1010121.g002
One of the most striking aspects of HiCuT is how high frequency protein mediated long range interactions are easily obtained and interpretable with minimal computational processing, as there is extremely limited nonspecific background signal (as illustrated in S2 Fig). In fact, 60-80% of mapped long-range HiCuT interactions fall specifically within known ChIP peak sites, highlighting the specificity of this assay (S1 Table). This reduced background noise allows HiCuT datasets to forgo the use of loop calling programs or filtering pipelines required by existing assays, and this also reduces the number of sequencing reads required per sample (S1 Table and S2 Fig). Several aspects of our protocol highlight how this is possible: 1) Tagmentation generates highly specific and small amounts of DNA fragments. We achieve high resolution for loop origins.
2) We omit the Hi-C biotin pulldown step to capture more DNA, which minimizes PCR bias during the amplification step of sequencing library generation. Omitting this step did not introduce additional noise, as the percentage of long-range reads relative to total reads remains similar between HiCuT and HiChIP (S1 Table). 3) We simply apply a distance threshold to the HiC-Pro pipeline output to select for long-range interactions. These interactions remain "unprocessed", in contrast to the output of loop filtering algorithms, which may differ substantially from each other depending on parameter definitions and strategies. Individual loop filtering algorithms analysing the same dataset identify different numbers of loops [25]. 4) We use the Hi-C 3.0 protocol, which includes double fixation and double restriction enzyme digestion and has been shown to be 2-fold more sensitive in detecting long-range chromatin loops than the Hi-C 2.0 protocol used by all existing methods [12,26]. With these limited steps, we enrich for long-range interactions at a level of efficiency similar to existing methods despite less starting material, fewer sequencing reads, and minimal computational processing. Finally, HiCuT simultaneously generates high-quality DNA binding data.
80% of HiCuT CTCF interactions originate from a known protein-binding site, and HiCuT CTCF peaks match ~30% of published ChIP-Seq and CUT&RUN peaks (S1G Fig and S1 Table). This offers investigators profiling long-range protein-directed interactions added information about DNA protein occupancy without additional investment. In the past few years, successful attempts have been made to lower the amount of starting material for Hi-C based assays [27,28]. In particular, single cell Hi-C (scHi-C) detects long-range interactions from single cells [29,30]. One weakness of scHi-C, shared by all single-cell methods, is that the depth of potential interactions detected is extremely limited. Protein-directed population methods permit a deeper characterization of long-range interactions. Tagmentation has enabled researchers to generate ChIP-Seq comparable profiles more efficiently and from smaller cell populations [31,32]. Our individual 100,000 cell CTCF HiCuT replicates captured on average ~45% of the combined 3 sample dataset (S1D Fig). Future studies are needed to further scale down this method to lower cell numbers or even single cells.
HiCuT and other protein-directed 3C assays frequently detect more long-range interactions than traditional Hi-C. This difference may be due to insufficient sequencing depth of the Hi-C assay, or protein-directed methods may offer additional fidelity due to the enrichment of specific binding sites. One limitation of all protein-directed assays is that we cannot compare the capture frequencies of enriched regions to non-bound or non-enriched sequences. While we cannot rule out the possibility that these additional interactions are background noise, we note that HiCuT does not require loop calling algorithms, and the majority of the additional interactions fall in verified protein binding sites (S1 Table). Nevertheless, the functional relevance of these additional interactions will need further experimental validation. In conclusion, we present HiCuT, a rapid, low-input, and cost-effective method to generate genome-wide chromatin interaction maps from 100,000 cells and 12 million reads per sample. In addition to assessing 3D genome architecture, potential applications for HiCuT in primary human tissues include functionally linking previously identified SNPs to disease-causing genes and the unbiased identification of functionally relevant transcription factors. The use of HiCuT with H3K27ac will allow detection of enhancer-promoter interactions without knowing a protein factor a priori. HiCuT bypasses limitations of existing chromatin interaction methods and broadens the capacity for genomic profiling in systems previously unmeasurable, including primary cells, human tissue samples, and rare cell populations.

Cell culture and antibodies used
We used two different cell types: surface-adherent primary keratinocytes and non-adherent (floating) GM12878 cells. GM12878 cells were provided by Dr. A. Raj (University of Pennsylvania). GM12878 cells were cultured in RPMI (Thermo-Fisher Scientific, Cat.No: 11875-085), supplemented with 10% fetal bovine serum and 50U of penicillin and streptomycin (Thermo-Fisher Scientific, Cat.No: 15070-063). Primary keratinocytes were provided by the University of Pennsylvania Department of Dermatology, Skin Biology and Diseases Resource-based Center (SBDRC). Cells were grown in supplemented 50:50 keratinocyte media, a 50:50 mixture of keratinocytes-SFM (Thermo Scientific) and Medium 154 (Thermo Scientific). Cells were grown at 37˚C and 5% CO2.

Cell lysis and nuclei fixation
We fixed 100,000 cells in 0.5 mL of freshly made 1% formaldehyde solution at room temperature for 10 minutes. To quench the formaldehyde, we added glycine to a final concentration of 200 mM for 5 minutes at room temperature and then 15 minutes on ice. Cells were washed once with 0.05% BSA in PBS and spun down at 2,000 g for 5 minutes. We fixed cells for a second time with 3mM DSG (final concentration) in 500 μL PBS, at room temperature for 40 minutes, on rotation. We added glycine at a final concentration of 0.4 M for 5 minutes. Cells were washed once with 0.05% BSA in PBS and spun down at 2,000 g for 5 minutes. We resuspended cells in Hi-C lysis buffer (10mM Tris-HCl pH8.0, 10mM NaCl, 0.2% Igepal CA630, 1X protease inhibitor) and incubated them on ice for 30 minutes. We spun down the cells at 2,500 g for 5 minutes and washed the nuclei once with NEBuffer 3.1.

In situ contact generation
In situ contacts were generated according to the in situ Hi-C protocol with minor modifications. We resuspended nuclei in 161 μL of 1x NEBuffer 3.1 and permeabilized them by adding 19 μL of 1% SDS and incubating the mixture for 10 minutes at 65˚C without shaking. Immediately afterwards, we placed the tube on ice. We quenched the SDS by adding 21.5 μL of 10% Triton X-100 and incubating the samples at 37˚C for 15 minutes with shaking at 900 rpm. Next, we added 20 μL of 10U/μL DdeI, 4 μL of 50U/μL DpnII, and 2 μL of 1x NEBuffer 3.1 and mixed gently by pipetting. The mixture was incubated for 3 hours or overnight at 37˚C on a thermomixer at 900 rpm, in 30 seconds on, 4 minutes off mode. After digestion, we inactivated the enzymes at 65˚C for 20 minutes with no shaking. To fill in restriction fragment overhangs, we added 35 μL of end-filling master mix: 18.75 μL of 0.4 mM dATP; 0.75 μL of dTTP, dGTP, and dCTP at 10 mM each; and 5 μL of DNA polymerase I (NEB, M0210). We rotated the samples for 2-3 hours at 37˚C. We ligated the DNA fragments by adding 332.5 μL of ligation master mix containing: 60 μL of 10X NEB T4 DNA ligase buffer (NEB, B0202), 50 μL of 10% Triton X-100, 6 μL of 10 mg/mL BSA, 2.5 μL of 400 U/μL T4 DNA Ligase (NEB, M0202), and 214 μL of water. Samples were rotated end over end at room temperature for 2-3 hours. The nuclei were pelleted and washed once with 200 μL of exchange buffer (20 mM HEPES-KOH pH 7.9, 10 mM KCl, 0.1% Triton X-100, 20% Glycerol, 0.5 mM Spermidine, 1x EDTA-free Protease Inhibitor). The proximity ligated nuclei were resuspended in 100 μL of exchange buffer.
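As a quick consistency check on the ligation step, the component volumes listed above can be summed and scaled for several samples. The sketch below copies the volumes from the text. The helper names and the 10% pipetting overage are illustrative assumptions and not part of the published protocol.

```python
# Per-sample ligation master mix volumes in microlitres, copied from the text above.
LIGATION_MIX_UL = {
    "10X NEB T4 DNA ligase buffer": 60.0,
    "10% Triton X-100": 50.0,
    "10 mg/mL BSA": 6.0,
    "400 U/uL T4 DNA ligase": 2.5,
    "water": 214.0,
}


def total_volume(mix):
    """Sum the component volumes (uL) of one reaction."""
    return sum(mix.values())


def scale_mix(mix, n_samples, overage=0.1):
    """Scale a per-sample recipe to n_samples plus a fractional overage.

    The default 10% overage is an assumption made for illustration only.
    """
    factor = n_samples * (1.0 + overage)
    return {name: round(volume * factor, 2) for name, volume in mix.items()}


print(total_volume(LIGATION_MIX_UL))  # 332.5, matching the stated master mix volume
```

Summing the listed components reproduces the 332.5 μL stated for the ligation master mix.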

Chromatin cleavage and tagmentation
We washed 10 μL of Concanavalin-A Beads two times with 100 μL of bead activation buffer (20 mM HEPES, pH 7.9, 10 mM KCl, 1 mM CaCl2, 1 mM MnCl2) and resuspended them in 100 μL of cold bead activation buffer. Beads were added to 25 μL of the mixture, and the mixture was incubated at room temperature for 10 minutes. The tubes were placed on a magnetic stand, and the supernatant was removed.
We added 50 μL of cold antibody buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 0.5 mM Spermidine, 1x EDTA-free Protease Inhibitor, 0.01% Digitonin, 2 mM EDTA) and 1 μg of the appropriate primary antibody: RNA Polymerase II (Cell Signaling Technology: Cat. 2629), H3K27ac (Active Motif: Cat. 39133), or CTCF (Cell Signaling Technology: Cat. 3418). Samples were incubated for 2 hours at room temperature or overnight at 4˚C on a rotating platform. Next, we placed the samples on a magnetic stand and removed the supernatant. We added 50 μL of cold low-salt digitonin buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 0.5 mM Spermidine, 1x EDTA-free Protease Inhibitor, 0.01% Digitonin) and 0.25 μg of secondary antibody (Anti-Rabbit (EpiCypher: Cat. 13-0047) or Anti-Mouse (EpiCypher: Cat. 13-0048)). Samples were incubated at room temperature for 30 minutes and then washed twice with 200 μL of cold low-salt digitonin buffer. Next, 50 μL of ice-cold high-salt digitonin buffer (20 mM HEPES pH 7.5, 300 mM NaCl, 0.5 mM Spermidine, 1x EDTA-free Protease Inhibitor, 0.01% Digitonin) and 2.5 μL of CUTANA pAG-Tn5 (20x stock from EpiCypher) were added, and samples were incubated at room temperature for 1 hour. They were washed twice with 200 μL of cold high-salt digitonin buffer. Next, samples were resuspended in 50 μL of cold tagmentation buffer (20 mM HEPES pH 7.5, 300 mM NaCl, 0.5 mM Spermidine, 1x EDTA-free Protease Inhibitor, 10 mM MgCl2) and incubated at 37˚C for 1 hour. We removed the supernatant and resuspended the samples in 50 μL of Release Buffer (10 mM TAPS pH 8.5, 0.2 mM EDTA, 0.5% SDS, 22.5 mM EDTA, and 1 μL of 20 mg/mL Proteinase K). Samples were incubated at 58˚C for 1 hour and 68˚C for 2 hours in a thermocycler. After incubation, the supernatant was collected and purified with the Zymo Research ChIP DNA Clean & Concentrator protocol (Cat No. D5210), as per the manufacturer's recommendation. We eluted the PCR-ready HiCuT libraries in a 21 μL volume.

Library amplification
To each sample, we added 2 μL of universal i5 primer, 10 μM of barcoded i7 primers (EpiCypher), and 25 μL of CUTANA High Fidelity 2x PCR Master Mix (EpiCypher). Primer sequences are listed in S4 Table. The following PCR settings were used: 58˚C for 5 min, 72˚C for 5 min, 98˚C for 45 sec, then cycles of 98˚C for 15 sec, 60˚C for 10 sec, and 72˚C for 1 min. Samples were amplified for 18 cycles. Size selection was performed using Ampure XP beads following the manufacturer's recommendation. Libraries were eluted in 15 μL of elution buffer (Qiagen) and quantified using both a Qubit fluorometer and qPCR against Illumina primers. Libraries were sequenced in 75 bp paired-end format (Illumina NextSeq). Processing time for this protocol is 1.5 days.

HiCuT data analysis
Paired-end reads were aligned independently to the hg19 human genome using bowtie2 (global parameters: --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder; local parameters: --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder) through the HiC-Pro software [14]. The valid pair file was used for downstream analysis. Trans interactions and interactions shorter than 1 kb were filtered out. For comparative analyses, we identified chromatin interactions that fall within a 20 kb to 2 Mb window and have at least one anchor of the interacting pair falling in a ChIP peak region. Following the ChIP peak overlap and distance thresholding, we consider each valid pair as an individual interaction. If two overlapping valid pairs were present, we considered them as two interaction points. The QC metrics and the final interactions are provided in S1 and S2 Tables. For comparison to our HiCuT datasets, we used published loop coordinates from the following sources: GM12878 HiChIP Smc1a: GSE80820; GM12878 HiChIP CTCF: GSE115524; GM12878 Hi-C HiCCUPS loops: GSE63525; and GM12878 ChIA-PET RNA Polymerase2 loops: GSM1872887. The GM12878 ChIA-PET RNA Polymerase2 interacting pairs for APA analysis were taken from the 4DN Data portal (4DNESZ25M0ZV) [35].
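The distance and ChIP peak filter described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline. It assumes valid pairs have already been reduced to (chrom1, pos1, chrom2, pos2) tuples, that peaks are non-overlapping half-open intervals, and that the 20 kb and 2 Mb bounds are inclusive. All function names are hypothetical.

```python
from bisect import bisect_right

MIN_DIST = 20_000     # 20 kb lower bound from the text (inclusive by assumption)
MAX_DIST = 2_000_000  # 2 Mb upper bound from the text (inclusive by assumption)


def build_peak_index(peaks):
    """Group (chrom, start, end) peaks by chromosome, sorted by start.

    Assumes peaks on one chromosome do not overlap each other.
    """
    index = {}
    for chrom, start, end in peaks:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_peak(index, chrom, pos):
    """True if pos lies inside a half-open [start, end) peak on chrom."""
    intervals = index.get(chrom, [])
    # Locate the last interval whose start is <= pos.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def filter_pairs(pairs, index):
    """Keep cis pairs 20 kb to 2 Mb apart with at least one end in a peak."""
    kept = []
    for chrom1, pos1, chrom2, pos2 in pairs:
        if chrom1 != chrom2:
            continue  # trans interactions are dropped
        if not MIN_DIST <= abs(pos2 - pos1) <= MAX_DIST:
            continue  # outside the long-range window
        if in_peak(index, chrom1, pos1) or in_peak(index, chrom2, pos2):
            kept.append((chrom1, pos1, chrom2, pos2))
    return kept
```

Each retained pair counts as one interaction, consistent with the statement that overlapping valid pairs are treated as separate interaction points.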

Scatterplots and correlations
The fastq files from HiCuT samples were aligned to the hg19 reference genome using bowtie2. Following conversion from .sam to .bam format, the .bam files were processed using bamCompare from deepTools2.0 with default settings. The output was used to calculate the Spearman correlation between replicates using the plotCorrelation tool from deepTools2.0 [34].
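For readers unfamiliar with the statistic, the Spearman correlation between two replicates is the Pearson correlation of their ranked values. The sketch below is a plain Python illustration of that definition on per-bin read counts. It is not the deepTools implementation, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` and `y` would hold read counts for the same genomic bins in two replicates.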

Visualization of HiCuT interactions
The HiCuT interactions and ChIP-Seq peaks from ENCODE were visualised using Juicebox and WashU Epigenome legacy browser [18,36]. The HiC maps in the Juicebox images were taken from the Juicebox archive for respective cell lines.

Motif analysis
We identified H3K27ac HiCuT anchors that do not fall within +/-2500 bp of known RefSeq promoters or transcription start sites (from UCSC table browser). Since HiCuT interactions are 1 bp in size, the filtered anchors were extended 100 bp on each side [37]. We used HOMER Motif Analysis software (http://homer.ucsd.edu/homer/motif/) to identify over-represented transcription factor binding sites in these anchor sequences.
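The anchor preparation described above (promoter exclusion followed by a 100 bp extension on each side) can be sketched as follows. This is a hedged illustration only: it assumes 0-based anchor positions, half-open output windows suitable for a BED file, and a hypothetical dictionary of transcription start sites per chromosome.

```python
FLANK = 100             # bp added on each side of a 1 bp anchor (from the text)
PROMOTER_WINDOW = 2500  # bp excluded on either side of a TSS (from the text)


def is_promoter_proximal(chrom, pos, tss_by_chrom):
    """True if pos is within PROMOTER_WINDOW bp of any TSS on chrom."""
    return any(
        abs(pos - tss) <= PROMOTER_WINDOW for tss in tss_by_chrom.get(chrom, [])
    )


def distal_anchor_windows(anchors, tss_by_chrom):
    """Drop promoter-proximal anchors and extend the rest by FLANK bp per side.

    anchors: iterable of (chrom, pos) with 0-based positions (assumed).
    Returns half-open (chrom, start, end) windows, clipped at position 0.
    """
    windows = []
    for chrom, pos in anchors:
        if is_promoter_proximal(chrom, pos, tss_by_chrom):
            continue
        windows.append((chrom, max(0, pos - FLANK), pos + FLANK + 1))
    return windows
```

The resulting windows are the kind of interval list that a motif enrichment tool would take as input.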

SNP analysis
The SNP coordinates were taken from the GWAS dataset (Id: EFO_0000676), downloaded from the NHGRI-EBI Catalog of human genome-wide association studies (https://www.ebi.ac.uk/gwas/home). Since our primary keratinocyte H3K27ac HiCuT interactions are 1 bp in size, we applied a 5 kb window using the bedtools v2.30.0 window function to identify HiCuT interactions falling in the vicinity of a SNP locus. The SNP-matched interactions were then mapped to +/-2500 bp of known RefSeq promoters (from UCSC table browser). We identified 343 unique genes (S3 Table).
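The idea behind the 5 kb window match can be illustrated with a small sketch. It approximates the windowing step for 1 bp features and is not a reimplementation of bedtools. It assumes that an interaction counts as SNP-matched when either anchor lies within 5 kb of a SNP. The SNP identifiers in the usage example are made up.

```python
WINDOW = 5_000  # 5 kb window from the text


def snps_near(chrom, pos, snps_by_chrom, window=WINDOW):
    """Return IDs of SNPs on chrom within `window` bp of pos."""
    return [
        snp_id
        for snp_pos, snp_id in snps_by_chrom.get(chrom, [])
        if abs(snp_pos - pos) <= window
    ]


def snp_matched_interactions(interactions, snps_by_chrom):
    """Pair each cis interaction with the SNPs near either of its anchors.

    interactions: iterable of (chrom, pos1, pos2) with 1 bp anchors (assumed).
    Returns a list of (interaction, sorted SNP IDs) for interactions with a hit.
    """
    matched = []
    for chrom, pos1, pos2 in interactions:
        hits = set(snps_near(chrom, pos1, snps_by_chrom))
        hits.update(snps_near(chrom, pos2, snps_by_chrom))
        if hits:
            matched.append(((chrom, pos1, pos2), sorted(hits)))
    return matched


# Usage with made-up coordinates and placeholder SNP IDs.
example_snps = {"chr1": [(100_000, "rsA"), (900_000, "rsB")]}
example_interactions = [("chr1", 103_000, 400_000), ("chr1", 200_000, 500_000)]
print(snp_matched_interactions(example_interactions, example_snps))
```

The non-SNP anchor of each matched interaction would then be checked against the promoter windows to nominate candidate genes.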

Aggregate peak analysis (APA)
APA plots were generated for each set of interactions and loci. For each set, the loci over which to plot interactions were determined by taking all pairwise combinations of peaks that were at least 5,000 bp but less than 1 Mb apart. From this list, 200,000 random loci pairs were chosen per APA plot. APA plots were generated using the apa function of the software package juicer (v 1.6) [38]. Data normalized using Knight-Ruiz balancing were plotted.

Supporting information
[...] for HiCuT and HiChIP. Captured interactions were between 20 kb and 2 Mb in length, with at least one anchor overlapping a known CTCF ChIP-seq peak. The right panel displays final long-range interactions after loop calling programs were run on the HiChIP and Hi-C datasets. HiCuT did not require additional filtering. ChIP-Seq tracks are from the ENCODE GM12878 dataset (GSM733752), and gene names are listed below [15,17]. Chr, Chromosome. (TIF)
S3 Fig. Raw Hi-C plots from juicer. In situ GM12878 Hi-C data maps generated from juicer [13,18,38]. [...]
[...] [7]. (B) GM12878 Hi-C contact maps at two different loci. Hi-C dataset at 5 kb resolution superimposed with HiCuT interactions (top panels, upper right, black boxes), GM12878 Hi-C HiCCUPS loops (all panels, lower left, open blue boxes), and GM12878 SMC1a HiChIP loops (lower panels, upper right, black boxes) (GEO GSE80820). Maximum intensity is indicated in the lower right of each panel [7]. (C) HiCuT captures previously published microscopically validated loops [13]. GM12878 Hi-C contact map with superimposed location of different DNA FISH probes. The blue probes (P1 and P2) were shown to interact in a DNA FISH experiment (blue rectangle), and HiCuT detected this interaction [13]. The green boxes represent non-interacting regions between FISH probes (P1 to P3). GM12878 CTCF HiCuT interactions (black boxes) are superimposed on the in situ Hi-C map. Maximum intensity is indicated in the lower right of each panel. [...]
[...] [15,17]. (D) APA plots for H3K27ac HiCuT around pairs of H3K27ac-binding sites from NHEK cells (GSM733771) [15,17]. (E) WashU Epigenome browser view of two different genomic regions highlighting protein-directed chromatin interactions. The NHEK H3K27ac ChIP tracks are from ENCODE (blue), followed by the location of SNPs associated with inflammatory skin diseases (red, NHGRI-EBI catalog, EFO_0000676) and chromatin interactions identified by the H3K27ac HiCuT assay (red loops). Chr, Chromosome. (F) Comparison of H3K27ac-mediated long-range interactions in primary keratinocytes and GM12878 cells. The H3K27ac ChIP tracks are from ENCODE NHEK cells (GSM733771), followed by H3K27ac HiCuT interactions in primary keratinocytes or GM12878 cells [15,17]. (TIF)
S1 Table. Tables represent data metrics of HiCuT and SRA datasets. The valid interactions and unique valid interactions were obtained from HiC-Pro. The final column describes the number and percentage of interactions that fall within the respective ChIP peaks taken from ENCODE. (XLSX)
S2 Table. Table represents